Spatial Data Science Conference - 2023¶

image

Foundations of Geospatial¶

Danny Sheehan Climate Change Solutions Architect at AWS¶

Work:

  • AWS
  • Jupiter Intelligence
  • Mount Sinai Emergency Medicine
  • UBS Investment Bank
  • AutoCEQR
  • Carto
  • Columbia University
  • AECOM, etc.

Education:

  • Columbia University
  • University at Buffalo
  • Geneseo

Teaching:

  • Columbia University
  • Barnard College
  • Pratt Institute

Certifications:

  • Data Science (Columbia University, General Assembly)
  • AWS CCP, Prefect, etc.

LinkedIn: https://www.linkedin.com/in/daniel-martin-sheehan/

Dependencies¶

This Notebook uses https://rise.readthedocs.io/en/stable/ - use jupyter notebook to use, not lab :/

In [1]:
import warnings
warnings.filterwarnings('ignore')
# See README.md for all libraries and steps to create the environment. 

import pandas as pd
import geopandas as gpd
from shapely import wkt
import matplotlib.pyplot as plt
import rasterio
from rasterstats import zonal_stats

Geospatial Data Models¶

A data model is a way of defining and representing real world surfaces and characteristics in GIS.

There are two primary types of spatial data models: Vector and Raster.

Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html

Vector¶

Vector data represents features as discrete points, lines, and polygons

Raster¶

Raster data represents features as a rectangular matrix of square cells (pixels)

image.png

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

TIN Data model¶

A triangulated irregular network (TIN)

is a data model commonly used to represent terrain heights - Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

Typical use cases are for terrain inluts in Hydrologic Modeling / Watershed Modeling.

tin

Source: http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf

Vector Data Model¶

Vector data is very common, and is often used to represent features like roads and boundaries. Vector data comes in the form of points and lines that are geometrically and mathematically associated.

  • Points:
* One pair of coordinates defines the location of a point feature.

  • Polylines (LineStrings):
* Two or more pairs of coordinates that are connected define a line feature.
* A series of connected points

  • Polygons:
* Multiple pairs of coordinates that are connected and closed define a polygon feature.
* A series of connected points that loop back to the first point
  * Multiple "polygons" can exist in one layer
  * Polygons can have internal polygons or "holes"
  * The beginning and ending coordinates for a polygon are the same.

Source: http://gsp.humboldt.edu/OLM/Courses/GSP_216_Online/lesson3-1/data-models.html

Shapely¶

One dependency (another Python Library GeoPandas depends on) is Shapely

Shapely is a Python package for set-theoretic analysis and manipulation of planar features using (via Python’s ctypes module) functions from the well known and widely deployed GEOS library. GEOS, a port of the Java Topology Suite (JTS), is the geometry engine of the PostGIS spatial extension for the PostgreSQL RDBMS. The designs of JTS and GEOS are largely guided by the Open Geospatial Consortium’s Simple Features Access Specification 1 and Shapely adheres mainly to the same set of standard classes and operations. Shapely is thereby deeply rooted in the conventions of the geographic information systems (GIS) world, but aspires to be equally useful to programmers working on non-conventional problems.

Source: https://shapely.readthedocs.io/en/latest/manual.html

Vector¶

Points¶

In [3]:
from shapely.geometry import Point
point = Point(
    0.0, 
    0.0)

point
Out[3]:

Lines - (LineString)¶

In [4]:
from shapely.geometry import LineString
In [5]:
line = LineString([(0, 0), (2, 2)])
line
Out[5]:
In [6]:
a = LineString([(0, 0), (1, 1), (1, 2), (2, 2)])
b = LineString([(0, 0), (1, 1), (2, 1), (2, 2)])
In [7]:
a
Out[7]:
In [8]:
b
Out[8]:

Intersection of Two LineString Objects¶

In [9]:
x = a.intersection(b)
x
Out[9]:

Polygons¶

In [10]:
from shapely.geometry import Polygon
In [11]:
c = Polygon([[1, 1], [-1, 3], [3, 3], [3, 1]])
d = Polygon([[0, 0], [0, 4], [4, 4], [4, 1]])
In [12]:
c
Out[12]:
In [13]:
d
Out[13]:

Intersection of Two Polygons¶

In [14]:
x = c.intersection(d)
x
Out[14]:

MultiPoint¶

In [15]:
from shapely.geometry import MultiPoint
In [16]:
e = MultiPoint([(0, 0), (1, 1), (1, -1)])
e
Out[16]:
In [17]:
e.convex_hull  # minimum bounding geometry
Out[17]:

Pandas¶

pandas-img

Introduction to Pandas¶

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Source: https://pandas.pydata.org/pandas-docs/stable/index.html

If totally new to Pandas, you can think of Pandas as the table (.dbf) of a Shapefile or as a Sheet in Excel or Google Spreadsheet.

Pandas DataFrame¶

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input. Source: https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html

pandas_dataframe

Source: https://pandas.pydata.org/pandas-docs/stable/getting_started

Reading in a .csv into Pandas¶

In [18]:
csv_url = 'https://data.cityofnewyork.us/api/views/he7q-3hwy/rows.csv?accessType=DOWNLOAD'
In [44]:
df = pd.read_csv(csv_url)
In [45]:
# show a random sample of rows from the .csv data read in to the DataFrame object
df.sample(10)
Out[45]:
OBJECTID URL NAME the_geom LINE
86 1820 http://web.mta.info/nyct/service/ 33rd St & Queens Blvd at SW corner POINT (-73.9317939996055 40.74476300048575) 7
1277 1133 http://web.mta.info/nyct/service/ Broadway & 86th St at SE corner POINT (-73.97628800043398 40.78807300127442) 1
1488 1344 http://web.mta.info/nyct/service/ Lorimer St & Broadway at SW corner POINT (-73.9466129996701 40.70332100121268) J-M
1831 1687 http://web.mta.info/nyct/service/ NaN POINT (-73.99493799970026 40.725137000715904) B-D-F-M-6
293 149 http://web.mta.info/nyct/service/ 7th Ave & 53rd St at NE corner POINT (-73.98157800036606 40.762937001104106) B-D-E
1746 1602 http://web.mta.info/nyct/service/ 4th Ave & 45th St at NW corner POINT (-74.00960200017478 40.64954600111841) R
556 412 http://web.mta.info/nyct/service/ Church St & Chambers St at NE corner POINT (-74.00773899971288 40.71499200122226) A-C-E-2-3
438 294 http://web.mta.info/nyct/service/ 14th Ave & 61st St at NW corner POINT (-73.99687000021943 40.627110001144935) D-N
425 281 http://web.mta.info/nyct/service/ East 16 St & Avenue M at NW corner POINT (-73.95946000002303 40.618140000630724) B-Q
1149 1005 http://web.mta.info/nyct/service/ 6th Ave & 49th St at SE corner POINT (-73.98078800025752 40.75899600118979) B-D-F-M-7
In [46]:
# show the first 15 lines using .head()

df.head(15)
Out[46]:
OBJECTID URL NAME the_geom LINE
0 1734 http://web.mta.info/nyct/service/ Birchall Ave & Sagamore St at NW corner POINT (-73.86835600032798 40.84916900104506) 2-5
1 1735 http://web.mta.info/nyct/service/ Birchall Ave & Sagamore St at NE corner POINT (-73.86821300022677 40.84912800131844) 2-5
2 1736 http://web.mta.info/nyct/service/ Morris Park Ave & 180th St at NW corner POINT (-73.87349900050798 40.84122300105249) 2-5
3 1737 http://web.mta.info/nyct/service/ Morris Park Ave & 180th St at NW corner POINT (-73.8728919997833 40.84145300067447) 2-5
4 1738 http://web.mta.info/nyct/service/ Boston Rd & 178th St at SW corner POINT (-73.87962300013866 40.84081500075867) 2-5
5 1739 http://web.mta.info/nyct/service/ Boston Rd & E Tremont Ave at NW corner POINT (-73.88000500027815 40.840434000875874) 2-5
6 1740 http://web.mta.info/nyct/service/ Boston Rd & E Tremont Ave at NE corner POINT (-73.87983300021861 40.84035400111976) 2-5
7 1741 http://web.mta.info/nyct/service/ Boston Rd & 178th St at SE corner POINT (-73.8795549998336 40.84063900116436) 2-5
8 1742 http://web.mta.info/nyct/service/ Boston Rd & 178th St at NW corner POINT (-73.87939700013239 40.84107800066419) 2-5
9 1743 http://web.mta.info/nyct/service/ Boston Rd & 174th St at SW corner POINT (-73.88804799985908 40.83732500129732) 2-5
10 1744 http://web.mta.info/nyct/service/ Boston Rd & 174th St at NW corner POINT (-73.88775499967073 40.83759900080287) 2-5
11 1745 http://web.mta.info/nyct/service/ Southern Blvd & Freeman St at SW corner POINT (-73.8920200000541 40.82990300129103) 2-5
12 1746 http://web.mta.info/nyct/service/ Southern Blvd & Freeman St at NW corner POINT (-73.89201499957316 40.83036900118696) 2-5
13 1747 http://web.mta.info/nyct/service/ Westchester Ave & Simpson St at NE corner POINT (-73.8928969996593 40.824356000986995) 2-5
14 1748 http://web.mta.info/nyct/service/ Westchester Ave & Simpson St at NW corner POINT (-73.89311400018327 40.82417700086825) 2-5
In [47]:
df.shape
Out[47]:
(1928, 5)
In [48]:
df.value_counts('LINE')
Out[48]:
LINE
F          125
1          120
6          113
2-5         92
A           75
          ... 
A-H          1
A-FS         1
A-C-J-L      1
A-C-FS       1
e-F-M-R      1
Length: 97, dtype: int64

GeoPandas¶

geopandas-img

GeoPandas¶

In [60]:
from shapely import wkt

gdf = gpd.GeoDataFrame(
    df, geometry=df['the_geom'].apply(wkt.loads), 
    crs="EPSG:4326")

gdf.plot();
In [61]:
gdf.crs
Out[61]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
In [62]:
boro_geojson_link = 'https://data.cityofnewyork.us/api/geospatial/tqmj-j8zm?method=export&format=GeoJSON'
In [63]:
# read in geojson as geopandas GeoDataFrame
gdf = gpd.read_file(boro_geojson_link)
In [64]:
gdf.plot(figsize=(5, 5));  # nyc boros
In [65]:
gdf.head()
Out[65]:
boro_code boro_name shape_area shape_leng geometry
0 5 Staten Island 1623620725.06 325917.353702 MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ...
1 2 Bronx 1187174784.85 463179.772813 MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ...
2 1 Manhattan 636520830.768 357564.316391 MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ...
3 3 Brooklyn 1934143372.64 728197.541089 MULTIPOLYGON (((-73.86327 40.58388, -73.86381 ...
4 4 Queens 3041418506.76 888199.731579 MULTIPOLYGON (((-73.82645 40.59053, -73.82642 ...
In [66]:
# what is the coordinate reference system?

gdf.crs
Out[66]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

To change the Projection, we set the argument inplace=True¶

In [67]:
gdf.to_crs(epsg=2263, inplace=True)  
# changed the state of the projection systems
In [68]:
gdf.crs
Out[68]:
<Derived Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US Survey feet)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

Geoprocessing¶

Very broadly, Geoprocessing is any operation involving geospatial data or methods. The Geographic Information Systems (GIS) software company Esri refers to it as "Computing with geographic data."

It is commonly used interchangeably with the term Spatial Analysis.

  • However, Spatial Analysis includes the interpretation of the results of Geoprocessing.

Professor Jochen Albrecht defines Geoprocessing as;

...any GIS operation used to manipulate data. A typical geoprocessing operation takes an input dataset, performs an operation on that dataset, and returns the result of the operation as an output dataset, also referred to as derived data.

Common geoprocessing operations are geographic feature overlay, feature selection and analysis, topology processing, and data conversion.

Geoprocessing allows you to define, manage, and analyze geographic information used to make decisions. - Jochen Albrecht

Geoprocessing Library Selection¶

There are several Geoprocessing libraries and technologies. A few notable open source ones are;

  • PostGIS - Spatially Enabled PostgreSQL
  • CARTO - Carto Spatial Analysis
  • GeoPandas - Pandas Extended with Shapely
  • Arcpy - Esri's Python site package
  • PyQGIS - QGIS's Python Package

SQL¶

  • Carto Spatial SQL
  • PostGIS
  • AWS Redshift
  • Amazon Athena
  • Trino
  • etc.
SELECT
 ST_Intersects(a.the_geom, b.the_geom) AS the_geom
FROM
 a, b

ArcPy¶

ArcPy is a Python site package that provides a useful and productive way to perform geographic data analysis, data conversion, data management, and map automation with Python. Esri

Arcpy is for ArcGIS Desktop or ArcGIS Pro. The ArcGIS API for Python, using ArcGIS Online is also available for multiple platforms.

arcpy.analysis.Intersect([feature_1, feature_2], out_feature)

Geometric Manipulations¶

Buffer¶

Returns a GeoSeries of geometries representing all points within a given distance of each geometric object. Source: https://geopandas.org/geometric_manipulations.html

In [69]:
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax)

gdf.buffer(5000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)

plt.title('Buffer');

Centroid¶

Returns a GeoSeries of points for each geometric centroid. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary

In mathematics and physics, the centroid or geometric center of a plane figure is the arithmetic mean position of all the points in the figure. Informally, it is the point at which a cutout of the shape could be perfectly balanced on the tip of a pin.[1]

The definition extends to any object in n-dimensional space: its centroid is the mean position of all the points in all of the coordinate directions.[2] Source: https://en.wikipedia.org/wiki/Centroid

centroid Source: https://en.wikipedia.org/wiki/Centroid

In [70]:
fig, ax = plt.subplots(figsize=(7, 7))
gdf.plot(ax=ax)

gdf.centroid.plot(ax=ax, facecolor="red", markersize=200)

plt.title('Centroid');

Convex Hull¶

Returns a GeoSeries of geometries representing the smallest convex Polygon containing all the points in each object unless the number of points in the object is less than three. For two points, the convex hull collapses to a LineString; for 1, a Point. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary

In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. For a bounded subset of the plane, the convex hull may be visualized as the shape enclosed by a rubber band stretched around the subset. Source: https://en.wikipedia.org/wiki/Convex_hull

In [71]:
fig, ax = plt.subplots(figsize=(6, 6))
gdf.plot(ax=ax)

gdf.convex_hull.plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red', linewidth=3)

plt.title('Convex Hull');

Envelope¶

Returns a GeoSeries of geometries representing the point or smallest rectangular polygon (with sides parallel to the coordinate axes) that contains each object. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary

In [73]:
fig, ax = plt.subplots(figsize=(7, 7))

gdf.plot(
    ax=ax, 
)

gdf.envelope.plot(ax=ax, facecolor="none", alpha=0.7,edgecolor='red', linewidth=3)

plt.title('Envelope');

Simplify¶

Returns a GeoSeries containing a simplified representation of each object. Source: https://geopandas.org/geometric_manipulations.html#GeoSeries.boundary

In [74]:
fig, ax = plt.subplots(figsize=(5, 5))
gdf.plot(ax=ax)

gdf.simplify(10000).plot(ax=ax, facecolor="none", alpha=0.7, edgecolor='red',  linewidth=3)

plt.title('Simplify');

Spatial Join¶

Add GeoPandas Spatial Join Examples

Set-Operations with Overlay¶

When working with multiple spatial datasets – especially multiple polygon or line datasets – users often wish to create new shapes based on places where those datasets overlap (or don’t overlap). These manipulations are often referred using the language of sets – intersections, unions, and differences. These types of operations are made available in the geopandas library through the overlay function.

The basic idea is demonstrated by the graphic below but keep in mind that overlays operate at the DataFrame level, not on individual geometries, and the properties from both are retained. In effect, for every shape in the first GeoDataFrame, this operation is executed against every other shape in the other GeoDataFrame. Source: https://geopandas.org/set_operations.html

overlay Source: https://geopandas.org/set_operations.html

These Relationships follow the DE-9IM model¶

The Dimensionally Extended nine-Intersection Model (DE-9IM) is a topological model and a standard used to describe the spatial relations of two regions (two geometries in two-dimensions, R2), in geometry, point-set topology, geospatial topology, and fields related to computer spatial analysis. The spatial relations expressed by the model are invariant to rotation, translation and scaling transformations. Source: https://en.wikipedia.org/wiki/DE-9IM

In [77]:
from IPython.display import IFrame

IFrame(src='https://en.wikipedia.org/wiki/DE-9IM', width=1000, height=600)
Out[77]:
In [78]:
import geopandas
from shapely.geometry import Polygon

polys1 = geopandas.GeoSeries([
    Polygon([(0,0), (2,0), (2,2), (0,2)]),
    Polygon([(2,2), (4,2), (4,4), (2,4)])])

polys2 = geopandas.GeoSeries([
    Polygon([(1,1), (3,1), (3,3), (1,3)]),
    Polygon([(3,3), (5,3), (5,5), (3,5)])])

df1 = geopandas.GeoDataFrame({'geometry': polys1, 'df1':[1,2]})
df2 = geopandas.GeoDataFrame({'geometry': polys2, 'df2':[1,2]})

ax = df1.plot(color='red', figsize=(7, 7));
df2.plot(ax=ax, color='green', alpha=0.5);

Union¶

The output feature class will contain polygons representing the geometric union of all the inputs as well as all the fields from all the input feature classes. See below for examples of how attribute values are assigned to the output features. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-union-analysis-works.htm

esri_union

In [80]:
res_union = geopandas.overlay(
    df1, 
    df2, 
    how='union')

res_union 
Out[80]:
df1 df2 geometry
0 1.0 1.0 POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1....
1 2.0 1.0 POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3....
2 2.0 2.0 POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3....
3 1.0 NaN POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0....
4 2.0 NaN MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000...
5 NaN 1.0 MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000...
6 NaN 2.0 POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5....
In [81]:
ax = res_union.plot(alpha=0.5, cmap='tab10', figsize=(7, 7))

df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

Intersect¶

The Intersect tool calculates the geometric intersection of any number of feature classes and feature layers. The features, or portion of features, that are common to all inputs (that is, they intersect) will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

Diagrams below from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

Polygon inputs and polygon output¶

esri_intersect_poly_poly

Polygon inputs and line output¶

esri_intersect_poly_line

Polygon inputs and point output¶

esri_intersect_poly_point

Line inputs¶

esri_intersect_line_line

Line inputs and point output¶

esri_intersect_line_point

Point inputs¶

esri_intersect_point_point

Diagrams above from: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/how-intersect-analysis-works.htm

In [82]:
res_intersection = geopandas.overlay(
    df1, 
    df2, 
    how='intersection')
In [83]:
res_intersection
Out[83]:
df1 df2 geometry
0 1 1 POLYGON ((2.00000 2.00000, 2.00000 1.00000, 1....
1 2 1 POLYGON ((2.00000 2.00000, 2.00000 3.00000, 3....
2 2 2 POLYGON ((4.00000 4.00000, 4.00000 3.00000, 3....
In [84]:
ax = res_intersection.plot(cmap='tab10', figsize=(6, 6))

df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

Symetrical Difference¶

Features or portions of features in the input and update features that do not overlap will be written to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/symmetrical-difference.htm

symetrical_difference

In [85]:
res_symdiff = geopandas.overlay(
    df1, 
    df2, 
    how='symmetric_difference')

res_symdiff
Out[85]:
df1 df2 geometry
0 1.0 NaN POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0....
1 2.0 NaN MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000...
2 NaN 1.0 MULTIPOLYGON (((2.00000 2.00000, 3.00000 2.000...
3 NaN 2.0 POLYGON ((3.00000 5.00000, 5.00000 5.00000, 5....
In [86]:
ax = res_symdiff.plot(cmap='tab10', figsize=(6, 6))

df1.plot(ax=ax, facecolor='none', edgecolor='k');
df2.plot(ax=ax, facecolor='none', edgecolor='k');

Difference¶

Seems most similar to Esri's Erase overlay function.

Creates a feature class by overlaying the Input Features with the polygons of the Erase Features. Only those portions of the input features falling outside the erase features outside boundaries are copied to the output feature class. Source: https://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/erase.htm

difference_erase

In [87]:
res_difference = geopandas.overlay(
    df1, 
    df2, 
    how='difference')

res_difference
Out[87]:
geometry df1
0 POLYGON ((2.00000 0.00000, 0.00000 0.00000, 0.... 1
1 MULTIPOLYGON (((3.00000 3.00000, 4.00000 3.000... 2
In [89]:
ax = res_difference.plot(cmap='tab10', figsize=(6, 6))

df1.plot(ax=ax, facecolor='none', edgecolor='k')
df2.plot(ax=ax, facecolor='none', edgecolor='k');

Dissolve¶

https://geopandas.org/aggregation_with_dissolve.html

In [90]:
countries = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
In [92]:
countries.plot(cmap='tab10', figsize=(10, 10));

Dissolve¶

In [93]:
countries = countries[['continent', 'geometry']]
In [94]:
continents = countries.dissolve(by='continent')
In [95]:
continents.plot(cmap='tab10', figsize=(10, 10));

Dissolve and Sum¶

In [96]:
countries = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
countries.head(14)
Out[96]:
pop_est continent name iso_a3 gdp_md_est geometry
0 889953.0 Oceania Fiji FJI 5496 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 58005463.0 Africa Tanzania TZA 63177 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253.0 Africa W. Sahara ESH 907 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 37589262.0 North America Canada CAN 1736425 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 328239523.0 North America United States of America USA 21433226 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
5 18513930.0 Asia Kazakhstan KAZ 181665 POLYGON ((87.35997 49.21498, 86.59878 48.54918...
6 33580650.0 Asia Uzbekistan UZB 57921 POLYGON ((55.96819 41.30864, 55.92892 44.99586...
7 8776109.0 Oceania Papua New Guinea PNG 24829 MULTIPOLYGON (((141.00021 -2.60015, 142.73525 ...
8 270625568.0 Asia Indonesia IDN 1119190 MULTIPOLYGON (((141.00021 -2.60015, 141.01706 ...
9 44938712.0 South America Argentina ARG 445445 MULTIPOLYGON (((-68.63401 -52.63637, -68.25000...
10 18952038.0 South America Chile CHL 282318 MULTIPOLYGON (((-68.63401 -52.63637, -68.63335...
11 86790567.0 Africa Dem. Rep. Congo COD 50400 POLYGON ((29.34000 -4.49998, 29.51999 -5.41998...
12 10192317.3 Africa Somalia SOM 4719 POLYGON ((41.58513 -1.68325, 40.99300 -0.85829...
13 52573973.0 Africa Kenya KEN 95503 POLYGON ((39.20222 -4.67677, 37.76690 -3.67712...
In [97]:
continents = countries.dissolve(
    by='continent', 
    aggfunc='sum')
continents
Out[97]:
geometry pop_est gdp_md_est
continent
Africa MULTIPOLYGON (((-11.43878 6.78592, -11.70819 6... 1.306370e+09 2455514
Antarctica MULTIPOLYGON (((-61.13898 -79.98137, -60.61012... 4.490000e+03 898
Asia MULTIPOLYGON (((48.67923 14.00320, 48.23895 13... 4.550277e+09 32725478
Europe MULTIPOLYGON (((-53.55484 2.33490, -53.77852 2... 7.454125e+08 21587850
North America MULTIPOLYGON (((-155.22217 19.23972, -155.5421... 5.837560e+08 25075988
Oceania MULTIPOLYGON (((147.91405 -43.21152, 147.56456... 4.120487e+07 1647113
Seven seas (open ocean) POLYGON ((68.93500 -48.62500, 69.58000 -48.940... 1.400000e+02 16
South America MULTIPOLYGON (((-68.63999 -55.58002, -69.23210... 4.270667e+08 3852015
In [98]:
continents.plot(
    column = 'pop_est', 
    scheme='quantiles', 
    cmap='YlOrRd', 
    figsize=(10, 10),
);

Merging Data¶

Spatial Joins¶

https://geopandas.org/mergingdata.html

In [99]:
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

cities = geopandas.read_file(geopandas.datasets.get_path('naturalearth_cities'))

countries = world[['geometry', 'name', 'pop_est', 'gdp_md_est']]
countries = countries.rename(columns={'name':'country'})
In [100]:
cities.plot(color='red');
In [101]:
countries.plot();
In [102]:
countries.head()
Out[102]:
geometry country pop_est gdp_md_est
0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000... Fiji 889953.0 5496
1 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... Tanzania 58005463.0 63177
2 POLYGON ((-8.66559 27.65643, -8.66512 27.58948... W. Sahara 603253.0 907
3 MULTIPOLYGON (((-122.84000 49.00000, -122.9742... Canada 37589262.0 1736425
4 MULTIPOLYGON (((-122.84000 49.00000, -120.0000... United States of America 328239523.0 21433226
In [103]:
cities.head()
Out[103]:
name geometry
0 Vatican City POINT (12.45339 41.90328)
1 San Marino POINT (12.44177 43.93610)
2 Vaduz POINT (9.51667 47.13372)
3 Lobamba POINT (31.20000 -26.46667)
4 Luxembourg POINT (6.13000 49.61166)

Spatial Join to capture the Countries Data, specifically GDP, we want to "JOIN" Country GDP to the Cities¶

In [104]:
cities_with_country = geopandas.sjoin(
    cities, 
    countries, 
    how="inner", 
    op='intersects')

Let's look at the Spatially Joined Output¶

In [105]:
cities_with_country.head(20)
Out[105]:
name geometry index_right country pop_est gdp_md_est
0 Vatican City POINT (12.45339 41.90328) 141 Italy 60297396.0 2003576
1 San Marino POINT (12.44177 43.93610) 141 Italy 60297396.0 2003576
226 Rome POINT (12.48131 41.89790) 141 Italy 60297396.0 2003576
2 Vaduz POINT (9.51667 47.13372) 114 Austria 8877067.0 445075
212 Vienna POINT (16.36469 48.20196) 114 Austria 8877067.0 445075
3 Lobamba POINT (31.20000 -26.46667) 73 eSwatini 1148130.0 4471
16 Mbabane POINT (31.13333 -26.31665) 73 eSwatini 1148130.0 4471
4 Luxembourg POINT (6.13000 49.61166) 128 Luxembourg 619896.0 71104
9 Bir Lehlou POINT (-9.65252 26.11917) 2 W. Sahara 603253.0 907
10 Monaco POINT (7.40691 43.73965) 43 France 67059887.0 2715518
13 Andorra POINT (1.52659 42.51075) 43 France 67059887.0 2715518
186 Geneva POINT (6.14003 46.21001) 43 France 67059887.0 2715518
235 Paris POINT (2.35299 48.85809) 43 France 67059887.0 2715518
14 Port-of-Spain POINT (-61.51703 10.65200) 175 Trinidad and Tobago 1394973.0 24269
15 Kigali POINT (30.05859 -1.95164) 169 Rwanda 12626950.0 10354
17 Juba POINT (31.58003 4.82998) 176 S. Sudan 11062113.0 11998
18 The Hague POINT (4.26996 52.08004) 130 Netherlands 17332850.0 907050
192 Amsterdam POINT (4.91469 52.35191) 130 Netherlands 17332850.0 907050
19 Ljubljana POINT (14.51497 46.05529) 150 Slovenia 2087946.0 54174
20 Bratislava POINT (17.11698 48.15002) 152 Slovakia 5454073.0 105079

Let's look at the output Geometry¶

In [106]:
cities_with_country.plot(color='red');

Plotting our GDP values assigned to Cities as a Graduated Symbol¶

In [107]:
fig, ax = plt.subplots(figsize=(10, 4))

countries.plot(ax=ax, facecolor='gray')

cities_with_country.plot(
    ax=ax,
    column='gdp_md_est', 
    scheme='quantiles',
    legend=True,
    cmap='Greens',
    edgecolor='white',
    alpha=0.6,
    markersize=cities_with_country['gdp_md_est']/5000.,
    legend_kwds={
        "title": 'GDP',
        "loc": 4})

plt.title('Gross Domestic Product (GDP) - 2016', {'fontsize': 27});
plt.savefig('img/gdp.png')

Raster Data Model¶

Raster¶

[Rasters] are frequently represented as a continuous grid of square cells, each containing a value indicating the (estimated) average height or strength of the field in that cell. In most of the literature and within software packages the points/ lines/ areas model is described as vector data, whilst the grid model is described as raster (or image) data. Source: https://www.spatialanalysisonline.com/HTML/index.html

Raster Basics¶

A raster is a rectangular grid of pixels with values that can continuous (e.g. elevation) or categorical (e.g. land use). This data structure is very common - jpg images on the web, photos from your digital camera, and the jupyterhub icon are all rasters. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

Common properties of any raster:

number of rows and columns (sometimes referred to as lines and samples) data type (dtype, or bit depth) - e.g., 8-bit (2^8 possible values, 0-255) some kind of resolution information, often dots per inch (dpi) with raster graphics A geospatial raster is only different from a digital photo in that it is accompanied by metadata that connects the grid to a particular location. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

raster_data_model

Source: National Ecological Observatory Network (NEON) via: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

Examples of categorical rasters¶

Some rasters contain categorical data where each pixel represents a discrete class such as a landcover type (e.g., "coniferous forest" or "grassland") rather than a continuous value such as elevation or temperature. Some examples of classified maps include:

  • Landcover / land-use maps.
  • Snowcover masks (binary snow or no snow)

The following map shows elevation data for the NEON Harvard Forest field site. In this map, the elevation data (a continuous variable) has been divided up into categories to yield a categorical raster. Source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/0-introduction.ipynb

category_raster

Source: Homer, C.G., et al., 2015, Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p. 345-354) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Raster and Resolution¶

resolution of a raster represents the area on the ground that each pixel of the raster covers. The image below illustrates the effect of changes in resolution. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

raster_resolution

Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Multispectral (multiband) Raster¶

A raster can contain one or more bands. One type of multi-band raster dataset that is familiar to many of us is a color image. A basic color image consists of three bands: red, green, and blue. Each band represents light reflected from the red, green or blue portions of the electromagnetic spectrum. The pixel brightness for each band, when composited creates the colors that we see in an image. Source: https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

multispectral_raster

Source: National Ecological Observatory Network (NEON) via https://datacarpentry.org/organization-geospatial/01-intro-raster-data/index.html

Let's compare a two maps, one derived by Vector Data and one derived from Raster Data¶

In [2]:
import ipyleaflet  # source: https://github.com/geohackweek/raster-2019/blob/master/notebooks/3-visualization-and-modis.ipynb
from ipyleaflet import Map, Rectangle, basemaps, basemap_to_tiles, TileLayer, SplitMapControl, Polygon
import ipywidgets
import datetime
import re
bbox = [43.16, -11.32, 43.54, -11.96]
west, north, east, south = bbox
bbox_ctr = [0.5*(north+south), 0.5*(west+east)]

m = Map(center=bbox_ctr, zoom=5)  # MODIS great for large areas (onl)
right_layer = basemap_to_tiles(basemaps.NASAGIBS.ModisTerraTrueColorCR, "2020-04-01")
left_layer = TileLayer()
control = SplitMapControl(left_layer=left_layer, right_layer=right_layer)
m.add_control(control)
m
Out[2]:
Map(center=[-11.64, 43.349999999999994], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_t…

Let's take a look at some very simple Raster Data model¶

In [3]:
import numpy as np
import pandas as pd 

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# let's allow our dataframe printouts in jupyter to be larger
In [4]:
simple_image = np.load('data/image_file.npy')   
# .npy is a numpy file storage format

What is this simple_image?¶

In [5]:
simple_image
Out[5]:
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,  11, 104, 159, 159, 232, 195, 102,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  19, 154, 227, 254, 235, 174, 167, 233, 184,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0, 106, 217, 254, 239, 159,  23,   0,   0,  68, 221,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         72, 249, 251, 131,  11,   0,   0,  53,  13,   2,  40,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  26,
        243, 250,  72,   0,   0,   5, 184, 251, 103,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 152,
        254, 170,   0,   0,   0, 148, 254, 254,  85,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  21, 225,
        254,  41,   0,  30, 198, 253, 254,  88,   2,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  25, 231,
        254,  67,  22, 181, 251, 246, 108,  20,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  37, 241,
        254, 146, 240, 254, 238,  62,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  41, 226,
        254, 254, 254, 192,  21,   3,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  91, 229,
        254, 246, 120,  16,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  56, 248, 254,
        254, 190,  15,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  92, 249, 229, 131,
        213, 254, 136,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  69, 233, 248,  65,   0,
        178, 254, 143,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   9,  73, 252, 248,  88,   0,   0,
         84, 249, 180,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  44, 254, 246,  50,   0,   0,  25,
        210, 247,  86,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  25, 213, 254,  84,   0,   2,  82, 223,
        254, 203,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  63, 254, 254,   4,  33, 168, 254, 254,
        176,   5,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,  31, 225, 254, 227, 240, 254, 235, 133,
          9,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,  42, 193, 254, 254, 222,  29,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]], dtype=uint8)

Plotting this image data in Pandas aas an .ndarray like a DataFrame?¶

In [7]:
pd.DataFrame(simple_image).head(28)
Out[7]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 104 159 159 232 195 102 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 154 227 254 235 174 167 233 184 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 106 217 254 239 159 23 0 0 68 221 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 72 249 251 131 11 0 0 53 13 2 40 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 26 243 250 72 0 0 5 184 251 103 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 152 254 170 0 0 0 148 254 254 85 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 21 225 254 41 0 30 198 253 254 88 2 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 25 231 254 67 22 181 251 246 108 20 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 37 241 254 146 240 254 238 62 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 41 226 254 254 254 192 21 3 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 91 229 254 246 120 16 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 56 248 254 254 190 15 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 92 249 229 131 213 254 136 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 69 233 248 65 0 178 254 143 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 9 73 252 248 88 0 0 84 249 180 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 44 254 246 50 0 0 25 210 247 86 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 25 213 254 84 0 2 82 223 254 203 0 0 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 63 254 254 4 33 168 254 254 176 5 0 0 0 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 31 225 254 227 240 254 235 133 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 42 193 254 254 222 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Interesting... Perhaps I can "show this array" somehow!¶

In [8]:
import matplotlib.pyplot as plt

plt.imshow(simple_image, cmap='gray')
plt.show();

Let's look at some LANDSAT 8 Data¶

Bands 2, 3, and 4 are visible blue, green, and red. Source: https://landsat.gsfc.nasa.gov/landsat-8/landsat-8-bands/

Much of this is referenced from: https://automating-gis-processes.github.io/CSC/lessons/L5/overview.html

In [9]:
import urllib

# https://automating-gis-processes.github.io/CSC/notebooks/L5/reading-raster.html
raster_url = 'https://github.com/Automating-GIS-processes/CSC18/raw/master/data/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'
helsinki_local_file = 'data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif'

urllib.request.urlretrieve(raster_url, helsinki_local_file)
Out[9]:
('data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif',
 <http.client.HTTPMessage at 0x16a7d0fa0>)

Rasterio¶

Rasterio: access to geospatial raster data

Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.

Source: https://rasterio.readthedocs.io/en/latest/index.html

In [10]:
import rasterio

dataset = rasterio.open(helsinki_local_file)
dataset
Out[10]:
<open DatasetReader name='data/raster/Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif' mode='r'>
In [19]:
plt.figure(figsize=(8, 8))
plt.imshow(dataset.read(1), cmap='Blues')
plt.title('Band 1 - Blue');

How many Bands?¶

In [14]:
dataset.count 
# number of raster bands
Out[14]:
7
In [18]:
from matplotlib import pyplot
from rasterio.plot import show

fig, (axr, axg, axb) = pyplot.subplots(1,3, figsize=(21,7))
show((dataset, 3), ax=axr, cmap='Reds', title='red channel')
show((dataset, 2), ax=axg, cmap='Greens', title='green channel')
show((dataset, 1), ax=axb, cmap='Blues', title='blue channel')
pyplot.show()
In [16]:
from rasterio.plot import show_hist

show_hist(
    dataset, bins=50, lw=0.0, stacked=False, alpha=0.3,
    histtype='stepfilled', title="Histogram of Bands")
In [20]:
dataset.bounds
Out[20]:
BoundingBox(left=698592.0, bottom=6656859.0, right=735300.0, top=6697870.5)
In [21]:
'X - ', dataset.bounds[0], dataset.bounds[2], '-', 'Y - ', dataset.bounds[1], dataset.bounds[3]
Out[21]:
('X - ', 698592.0, 735300.0, '-', 'Y - ', 6656859.0, 6697870.5)
In [22]:
np.mean([dataset.bounds[0], dataset.bounds[2]]), np.mean([dataset.bounds[1], dataset.bounds[3]])
Out[22]:
(716946.0, 6677364.75)

Raster Operations¶

Raster Math¶

raster-math

Source: https://desktop.arcgis.com/en/arcmap/10.7/tools/spatial-analyst-toolbox/cell-statistics.htm

Sampling Raster Values¶

In [24]:
sample_coordinates = [
    (716946.0, 6677364.75),
    (713046.0, 6676164.75),
    (714846.0, 6666164.75),
    (720846.0, 6668164.75),
]

samples = [x for x in dataset.sample(sample_coordinates)]
In [25]:
samples
Out[25]:
[array([ 65,  48,  34,  44,  41, 128,  27], dtype=uint8),
 array([ 85,  70,  69,  49,  78, 125,  61], dtype=uint8),
 array([ 55,  34,  25,  10,  10, 105,  11], dtype=uint8),
 array([ 59,  34,  28,  11,  11, 104,  10], dtype=uint8)]

Zonal Statistics¶

img

Source: https://gisgeography.com/zonal-statistics/

Zonal statistics summarizes the cells for a given region or line.

In [27]:
from shapely.geometry import Polygon
Polygon(sample_coordinates)
Out[27]:
In [28]:
from shapely.geometry import mapping, Polygon
import fiona

poly = Polygon(sample_coordinates)

schema = {'geometry': 'Polygon'}

with fiona.open('data/sample_coordinates.shp', 'w', 'ESRI Shapefile', schema) as c:
    c.write({'geometry': mapping(poly)})
    
# save a shapefile with fiona
In [29]:
import geopandas as gpd

fig, ax = plt.subplots(figsize=(10, 10))

p = gpd.GeoSeries(poly)
p.plot(ax=ax, facecolor='none', edgecolor='red')

rasterio.plot.show(dataset, ax=ax)
plt.show()
In [30]:
from rasterstats import zonal_stats

Band 1¶

In [31]:
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=1, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
Out[31]:
[{'min': 27.0,
  'max': 252.0,
  'sum': 4118148.0,
  'median': 61.0,
  'majority': 60.0}]

Band 6¶

In [32]:
stats = zonal_stats('data/sample_coordinates.shp', helsinki_local_file, band=6, stats=['min', 'max', 'median', 'majority', 'sum'])
stats
Out[32]:
[{'min': 103.0,
  'max': 160.0,
  'sum': 6997422.0,
  'median': 106.0,
  'majority': 105.0}]
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Folium¶

In [33]:
import folium 

m = folium.Map(
    location=[45.372, -121.6972],
    zoom_start=3,
    tiles='Stamen Terrain')

folium.Marker(
    [63.0690, -151.0063], 
    popup='<i>Denali</i>',
).add_to(m)

folium.Marker(
    [36.5786, -118.2920], 
    popup='<b>Mount Whitney</b>',
).add_to(m)

m
Out[33]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [ ]:
 
In [ ]:
 

Additional Resources¶

  • CSC Finland – IT Center for Science - Welcome to Introduction to Python GIS -course 2018 - https://automating-gis-processes.github.io/CSC/index.html
In [ ]: